Abstract

Coalescent histories provide lists of species tree branches on which gene tree coalescences can take place, 
and their enumerative properties assist in understanding the computational complexity of calculations central 
in the study of gene trees and species trees. Here, we solve an enumerative problem left open by Rosenberg 
(IEEE/ACM Transactions on Computational Biology and Bioinformatics 10: 1253-1262, 2013) concerning the
number of coalescent histories for gene trees and species trees with a matching labeled topology that belongs to 
a generic caterpillar-like family. By bringing a generating function approach to the study of coalescent histories, 
we prove that for any caterpillar-like family with seed tree $t$, the sequence $(h_n)_{n \geq 0}$ describing the number of
matching coalescent histories of the $n$th tree of the family grows asymptotically as a constant multiple of the
Catalan numbers. Thus, $h_n \sim \beta_t c_n$, where the asymptotic constant $\beta_t > 0$ depends on the shape of the seed
tree $t$. The result extends a claim demonstrated only for seed trees with at most 8 taxa to arbitrary seed trees,
expanding the set of cases for which detailed enumerative properties of coalescent histories can be determined.
We introduce a procedure that computes from $t$ the constant $\beta_t$ as well as the algebraic expression for the
generating function of the sequence $(h_n)_{n \geq 0}$.


1 Introduction 

Coalescent histories, mathematical structures representing combinatorially distinct ways in which a given gene 
tree can coalesce along the branches of a given species tree, are important in a variety of phylogenetic problems O 
[nmi]. They arise most prominently in characterizing the set of objects over which a sum is performed in a 
fundamental calculation for inference of species trees from information on multiple genetic loci, the evaluation 
of gene tree probabilities conditional on species trees [1]. 

Because of the appearance of coalescent histories in sets over which sums are computed, as well as in state 
spaces of certain phylogenetic Markov chains [SIEIIIU], solutions to enumerative problems involving coalescent 
histories contribute to an understanding of the computational complexity of phylogenetic calculations. A recursion
for the number of coalescent histories for a given gene tree and species tree has been established [12], and
several studies have reported exact numerical results and closed-form expressions for the number of coalescent 
histories for small trees and for specific types of trees of arbitrarily large size O m m [m m [H [E]. The latter 
computations have proceeded both by solving or deploying the recursion in specific cases [m la ns US] , as 
well as by identifying correspondences between coalescent histories and other combinatorial structures for which 
enumerative results have already been established mmm- 
Figure 1: A caterpillar-like family of species trees $(t^{(n)})_{n \geq 0}$. For a seed tree $t$, by adding $n \geq 0$ branches each with 1 leaf,
we obtain the $n$th tree of the family, $t^{(n)}$. If $t$ has 2 taxa, then $(t^{(n)})_{n \geq 0}$ is simply the caterpillar family.


One class of gene trees and species trees of particular interest for enumeration of coalescent histories is the 
caterpillar-like families, trees that have a caterpillar shape, except that the caterpillar subtree with r taxa is 
replaced by a subtree of size $r$ that is not necessarily a caterpillar subtree (Fig. 1). For the simplest caterpillar-like
family, the set of caterpillar trees themselves, if the gene tree and species tree have the same caterpillar labeled 
topology with $n$ taxa, then the number of coalescent histories is a Catalan number,

$$c_{n-1} = \frac{1}{n} \binom{2n-2}{n-1}. \qquad (1)$$

For $T_r$-caterpillar-like families, in which the $r$-taxon subtree of an $n$-taxon caterpillar species tree is replaced
by an $r$-taxon subtree (Fig. 1), by employing the recursion method, Rosenberg (2013) obtained the exact number
of coalescent histories for all $n$, for each $T_r$ with $r \leq 8$, in the case that the gene tree and species tree have the
same labeled topology. Rosenberg (2013) argued that in each of these cases, as $n \to \infty$, the number of coalescent
histories is asymptotic to a constant multiple of the Catalan numbers. A proof of this result has been presented 
in full for each case with r < 5 laiiaiis], and by computer algebra for cases with r = 6, 7, and 8 [T^ . 

Here, using a substantially different approach that brings to studies of coalescent histories the methods of 
analytic combinatorics, we produce an enumeration result that covers caterpillar-like families in general. We 
show that the result of Rosenberg (2013) applies to all caterpillar-like families, not only those for which $T_r$ has $r \leq 8$. That is,
we demonstrate that for any $T_r$, as $n \to \infty$, the number of coalescent histories in the $T_r$-caterpillar-like family is
asymptotic to a constant multiple of the Catalan numbers. We describe a method for computing the constant 
and provide a symbolic tool for performing the computation. Finally, we discuss the results in terms of their 
impact in mathematical phylogenetics. 


2 Preliminaries 

2.1 Species trees and coalescent histories 

We consider binary rooted leaf-labeled species trees, taking a single arbitrary labeling (without loss of generality) 
to represent a given unlabeled species tree topology. We consider an arbitrarily labeled species tree and its 
unlabeled tree interchangeably, treating the labeling as implicit. 

We examine coalescent histories for the case in which gene trees and species trees have the same labeled 
topology t, terming a coalescent history in this case a matching coalescent history. To be a matching coalescent 
history, a mapping h from the internal nodes of t (viewed as the gene tree) to the branches of t (viewed as 
the species tree) must satisfy two conditions: (a) for each leaf $x$ in $t$, if $x$ descends from node $k$ in $t$, then $x$
descends from branch $h(k)$ in $t$; (b) for each pair of internal nodes $k_1$ and $k_2$ in $t$, if $k_2$ descends from $k_1$ in $t$, then
branch $h(k_2)$ descends from or coincides with branch $h(k_1)$ in $t$. The definition of matching coalescent histories
is illustrated in Figure 2. We henceforth consider only matching coalescent histories, treating “matching” as
implicit in references to coalescent histories; we also refer simply to histories for short. 
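
Conditions (a) and (b) can be checked mechanically on small trees. The following is a minimal brute-force sketch, not the enumeration method used in this paper. It assumes a tree is written as nested 2-tuples with string leaves, and it identifies each branch by the path of the node directly below it; the function names `internal_nodes` and `count_matching_histories` are hypothetical.

```python
from itertools import product

def internal_nodes(tree, path=()):
    """Yield the path (tuple of child indices) of every internal node."""
    if isinstance(tree, tuple):
        yield path
        for i, child in enumerate(tree):
            yield from internal_nodes(child, path + (i,))

def count_matching_histories(tree):
    """Count mappings h from internal nodes to branches satisfying (a) and (b)."""
    nodes = list(internal_nodes(tree))
    # A branch is named by the path of the node directly below it; () is the
    # single root-branch. Condition (a) restricts h(k) to the branch above k
    # or above one of its ancestors, i.e. to a prefix of the path of k.
    choices = [[k[:d] for d in range(len(k) + 1)] for k in nodes]
    count = 0
    for images in product(*choices):
        h = dict(zip(nodes, images))
        # Condition (b): if k2 descends from k1, then h(k2) descends from or
        # coincides with h(k1), i.e. h(k1) is a prefix of h(k2).
        ok = all(
            h[k2][:len(h[k1])] == h[k1]
            for k1 in nodes for k2 in nodes
            if len(k2) > len(k1) and k2[:len(k1)] == k1
        )
        count += ok
    return count

caterpillar4 = ((("A", "B"), "C"), "D")
balanced4 = (("A", "B"), ("C", "D"))
print(count_matching_histories(caterpillar4), count_matching_histories(balanced4))
```

The search is exponential in the number of internal nodes, so it is only useful as a sanity check on trees with a handful of taxa.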
Figure 2: Matching coalescent histories. (A) A matching coalescent history. (B) A mapping from the internal nodes of a
tree to its branches that does not satisfy condition (a). Leaf B is descended from node $k$ but does not descend from branch
$h(k)$. (C) A mapping from the internal nodes of a tree to its internal branches that does not satisfy condition (b). Node $k_2$
is descended from node $k_1$, but branch $h(k_2)$ is strictly ancestral to branch $h(k_1)$.


2.2 Caterpillar-like families of species trees 

For a binary species tree $t$ with at least 2 taxa, we denote by $(t^{(n)})_{n \geq 0}$ the caterpillar-like family generated by
the seed tree $t$. This family is recursively defined by taking $t^{(0)} = t$ and letting $t^{(n+1)}$ be the tree obtained by
appending $t^{(n)}$ and a single leaf to a shared root (Fig. 1).

Our interest is in the number of matching coalescent histories of $t^{(n)}$ for $n \geq 0$, a quantity we denote by $h_n(t)$
or simply $h_n$. We note that whereas Rosenberg (2013) indexed trees by their numbers of taxa, here $n$ represents the number
of taxa appended above the root of the seed tree, so that if seed tree $t$ has $|t|$ taxa, then $|t| + n$ gives the number
of taxa in $t^{(n)}$.
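
The recursive definition translates directly into code. The sketch below assumes trees are nested 2-tuples with string leaves; the function names and the appended leaf names `x1, x2, ...` are hypothetical choices made for illustration.

```python
def caterpillar_like(seed, n):
    """Return t^(n): the seed tree with n single-leaf branches appended above its root."""
    tree = seed
    for i in range(1, n + 1):
        # t^(i) joins t^(i-1) and one new leaf at a shared root.
        tree = (tree, f"x{i}")
    return tree

def num_taxa(tree):
    """Number of leaves of a nested-tuple tree."""
    if isinstance(tree, tuple):
        return sum(num_taxa(child) for child in tree)
    return 1

seed = (("A", "B"), ("C", "D"))
print(caterpillar_like(seed, 2), num_taxa(caterpillar_like(seed, 2)))
```

With a 4-taxon seed and n = 2, the resulting tree has |t| + n = 6 taxa, in line with the indexing convention described above.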


2.3 Principles of analytic combinatorics 

We rely on techniques of analytic combinatorics [7] to obtain our enumerative results, and recall several key 
points. In general, an integer sequence $(a_n)_{n \geq 0}$ can be associated with a formal power series $A(z) = \sum_{n=0}^{\infty} a_n z^n$,
also termed the generating function of the integers $a_n$. Considering $z$ as a complex variable, typically in a
neighborhood of 0, features of the function $A(z)$ are related to the growth of the coefficients $a_n$.

More precisely, generating functions, considered as complex functions, enable analyses of the asymptotic 
growth of the associated integer sequences through the analysis of their singularities in the complex plane. In 
particular, under suitable conditions, there exists a general correspondence between the singular expansion of a 
generating function A(z) near its dominant singularities—those nearest the origin—and the asymptotic behavior 
of the associated coefficients (Chapter VI of [7]). We make use of generating functions that near their unique 
dominant singularity can be described by means of the square root function, and for which theorems on singularity 
analysis of generating functions [7] consequently apply. 


2.4 Catalan numbers 


The Catalan sequence appears often in combinatorics laiHiiii] and features prominently in our analysis. Rewriting
eq. (1) with index $n$ rather than $n - 1$,

$$c_n = \frac{1}{n+1} \binom{2n}{n}. \qquad (2)$$

The associated generating function is

$$C(z) = \sum_{n=0}^{\infty} c_n z^n = \frac{1 - \sqrt{1 - 4z}}{2z}. \qquad (3)$$

By definition, if $[z^n] f(z)$ denotes the $n$th term in the power series expansion of $f(z)$ at $z = 0$, we have

$$c_n = [z^n] C(z) = [z^n] \frac{1 - \sqrt{1 - 4z}}{2z} = \frac{1}{2} [z^{n+1}] \left( -\sqrt{1 - 4z} \right). \qquad (4)$$

Asymptotically, applying Stirling’s approximation $n! \sim \sqrt{2 \pi n}\,(n/e)^n$ to eq. (2), the Catalan sequence satisfies

$$c_n \sim \frac{4^n}{n^{3/2} \sqrt{\pi}}. \qquad (5)$$
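
As a numerical illustration of the Catalan asymptotics, the sketch below compares the exact Catalan numbers with the standard approximation $4^n / (n^{3/2} \sqrt{\pi})$. The function names are hypothetical, and the snippet is a sanity check rather than part of the proof.

```python
from math import comb, pi, sqrt

def catalan(n):
    """Exact Catalan number c_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)

def catalan_asymptotic(n):
    """Leading-order approximation 4^n / (n^(3/2) * sqrt(pi)), for n >= 1."""
    return 4.0 ** n / (n ** 1.5 * sqrt(pi))

for n in (10, 50, 200):
    # The ratio approaches 1 from below as n grows.
    print(n, catalan(n) / catalan_asymptotic(n))
```

The ratio converges slowly, with a relative error of order 1/n, so moderate values of n already show the trend.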



3 The number of matching coalescent histories for caterpillar-like families 

Our goal is to produce a procedure that evaluates the number $h_n(t)$ of coalescent histories for matching gene
trees and species trees in the caterpillar-like family $(t^{(n)})_{n \geq 0}$ that begins with seed tree $t$, and moreover, to show that

$$h_n(t) \sim \beta_t c_n, \qquad (6)$$

where the multiplier $\beta_t > 0$ for the Catalan sequence is a constant depending on $t$. In other words, we wish to
demonstrate that as $n \to \infty$, the ratio $h_n/c_n$ converges to a constant $\beta_t > 0$ that depends on the seed tree $t$.

First, in Section 3.1, we determine a lower bound for the number of matching coalescent histories of the $n$th
tree of the caterpillar-like family with seed tree $t$. Next, in Section 3.2, we introduce a concept of $m$-rooted
histories of a species tree $t^{(n)}$. The section provides an iterative construction of the rooted histories of $t^{(n+1)}$ from
those of $t^{(n)}$, describing the construction by means of a convenient labeling scheme. We follow a commonly used
combinatorial enumeration strategy mm that determines a recursive succession rule for successive collections
of objects in a sequence and then uses this rule to compute a generating function. In Section 3.3, we use the
iterative construction to produce a bivariate generating function whose coefficients $h_{n,m}$ are the numbers of $m$-rooted
histories for trees $t^{(n)}$. We next obtain the generating function for the integer sequence $(h_n)_{n \geq 0}$ describing
the number of matching coalescent histories for the $t^{(n)}$. Finally, using the lower bound from Section 3.1, in
Section 3.4 we apply methods of analytic combinatorics to study the asymptotic behavior of $h_n$.

3.1 Lower bound for $h_n$

To produce a lower bound for $h_n$, we first define $V$ as the tree with 2 taxa. Recalling that we index trees so that
the number of taxa in a tree is $n$ more than the number of taxa in the seed tree, we have [31 EH El]

$$h_n(V) = c_{n+1}.$$

A constructive procedure, illustrated in Figure 3, shows that for any seed tree $t$ with $|t| \geq 2$,

$$h_n(t) \geq h_n(V) = c_{n+1}. \qquad (7)$$

For a seed tree $t$, we can superimpose $V$ on $t$ so that the root $r_V$ of $V$ matches the root $r_t$ of $t$ (Fig. 3B). The two
leaves of $V$ are identified with two of the leaves of $t$, one on each side of the root of $t$. Generating caterpillar-like
families by adding $n$ single branches separately to $V$ and to $t$, the superposition of $V$ on $t$ extends, so that $V^{(n)}$
is superimposed on $t^{(n)}$ (Fig. 3C). The $n$ caterpillar branches of $V^{(n)}$ and $t^{(n)}$ then correspond.

Each matching coalescent history $h$ of $t^{(n)}$ determines a corresponding matching coalescent history $h'$ of $V^{(n)}$
by considering the restriction of the history $h$ to the set of internal nodes of $t^{(n)}$ that correspond to internal
nodes of $V^{(n)}$ (Fig. 3D). Every matching coalescent history of $V^{(n)}$ arises as such a restriction, for instance by
mapping each remaining internal node of $t^{(n)}$ to the branch immediately above it. Thus, for any given seed tree $t$,
the number of matching coalescent histories of $t^{(n)}$ is greater than or equal to the number of matching coalescent
histories of $V^{(n)}$. In symbols, we have eq. (7).
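
The lower bound can be illustrated numerically on small cases. The sketch below is an assumption-laden check, not the paper's procedure: trees are nested 2-tuples with string leaves, and the image of each internal node is encoded by how many branches below the root-branch it lies, so that condition (b) becomes "a child's image is no shallower than its parent's image". All function names are hypothetical.

```python
from math import comb

def count_histories(tree, depth=0, min_d=0):
    """Count matching coalescent histories of a nested-tuple tree.

    depth: index of the branch directly above this node (0 = root-branch).
    min_d: index of the branch chosen for the parent; the image of this
           node must lie at an index between min_d and depth.
    """
    if not isinstance(tree, tuple):
        return 1  # leaves carry no coalescence
    total = 0
    for d in range(min_d, depth + 1):
        ways = 1
        for child in tree:
            ways *= count_histories(child, depth + 1, d)
        total += ways
    return total

def family_member(seed, n):
    """t^(n): append n single-leaf branches above the root of the seed tree."""
    tree = seed
    for i in range(n):
        tree = (tree, f"x{i + 1}")
    return tree

def catalan(n):
    return comb(2 * n, n) // (n + 1)

seed = (("A", "B"), ("C", "D"))
for n in range(5):
    h_n = count_histories(family_member(seed, n))
    # Compare h_n(t) with the caterpillar value c_{n+1}, as in the bound above.
    print(n, h_n, catalan(n + 1), h_n >= catalan(n + 1))
```

For the 2-taxon seed, the same function returns the Catalan numbers $c_{n+1}$ themselves.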

3.2 Iterative generation of rooted histories 

This section describes the iterative procedure that for a seed tree $t$ eventually enables us to determine a formula
for $h_n$. First, in Section 3.2.1, we discuss $m$-rooted histories, which extend the concept of matching coalescent
histories, introducing an additional parameter $m$. Next, in Section 3.2.2, we examine the relationship between
Figure 3: Superposition of the caterpillar tree family on a caterpillar-like tree family with arbitrary seed tree of size $|t| \geq 2$.
(A) A seed tree $t$ and the seed tree $V$ for the caterpillar family. (B) Superposition of $V$ on $t$, so that the roots $r_V$ and $r_t$
overlap. (C) Superposition of $V^{(2)}$ (shaded internal nodes) on $t^{(2)}$ (shaded and unshaded nodes). The $n = 2$ caterpillar
branches in $V^{(2)}$ and $t^{(2)}$ overlap, and $r_V$ still matches $r_t$. (D) A matching coalescent history of $t^{(2)}$ (dashed and dotted
arrows) determines a matching coalescent history of $V^{(2)}$ (dashed arrows) by ignoring arrows from the unshaded nodes.


rooted histories and the extended coalescent histories of [12], importing results on extended coalescent histories
into the more convenient framework of rooted histories. We expand our goal of enumerating matching coalescent
histories for $t^{(n)}$, considering a more general problem of enumerating for $m \geq 1$ the $m$-rooted histories of $t^{(n)}$.

In Section 3.2.3, we define an operator $\Omega$ for constructing the rooted histories of $t^{(n+1)}$ from the rooted
histories of $t^{(n)}$. Next, in Section 3.2.4, we introduce a labeling scheme that in Section 3.2.5 enables us to switch
from counting rooted histories to counting multisets of labels. At the end of Section 3.2, we will have converted
our enumeration problem into an enumeration that is more convenient for constructing a generating function.

3.2.1 m-rooted histories 

Consider a tree $t$ with $|t| \geq 2$, and suppose that the branch above the root of $t$ (the root-branch) is divided into
infinitely many components. A matching coalescent history mapping the internal nodes of $t$ onto the branches
of $t$ is said to be $m$-rooted for $m \geq 1$ if the root of $t$ is mapped exactly onto the $m$th component of the root-branch
(Fig. 4). It is said to be rooted if it is $m$-rooted for some $m$. Components are numbered so that component $m = 1$ is
immediately above the root node, and $m$ is greater for components that are farther from the root.

For a rooted history $h$ of a tree $t$, $m = m(h)$ denotes the component of the root-branch of $t$ that receives the
image of the root of $t$. $H_{n,m}(t)$ denotes the set of $m$-rooted histories of $t^{(n)}$, and $H_n(t) = \bigcup_{m=1}^{\infty} H_{n,m}(t)$ denotes
the set of its rooted histories. The number of $m$-rooted histories of $t^{(n)}$ is $h_{n,m} = |H_{n,m}|$, and the number of
1-rooted histories $h_n = h_{n,1}$ is also the number of matching coalescent histories. Enumeration of the matching
coalescent histories of $t^{(n)}$ is equivalent to enumeration of the 1-rooted histories of $t^{(n)}$.

3.2.2 Rooted histories and extended histories 

Rooted histories are closely related to extended coalescent histories, as defined by [12]. We use this relationship
to study properties of rooted histories. Rosenberg [12] defined the set of k-extended coalescent histories of a tree 
Figure 4: Rooted histories of a tree. (A) A 3-rooted history. The root-branch is divided into infinitely many components.
The third component receives the image of the root. (B) A 1-rooted history. The number of 1-rooted histories corresponds 
to the number of matching coalescent histories of the tree. 


$t$ with $|t| \geq 1$ for integers $k \geq 1$; we also consider $k = 0$ by setting the number of 0-extended histories to 0.

A $k$-extended history is defined as a coalescent history for a species tree whose root-branch is divided into
exactly $k \geq 0$ parts. In other words, the root-branch has exactly $k \geq 0$ possible components onto which a
$k$-extended history can map the gene tree root. Here we consider matching $k$-extended histories, so that the
internal nodes of a tree $t$ are mapped to the branches of $t$ and its $k$ components above the root. For convenience,
we refer to extended histories by the index $k$, reserving the index $m$ for rooted histories.

By the definitions of $k$-extended and $m$-rooted histories, for each $k \geq 0$, the set of $k$-extended histories of a
tree is exactly the set of all $m$-rooted histories with $1 \leq m \leq k$. Therefore, for a tree $t$ with at least 2 leaves, if
we label by $e_{t,k}$ its number of $k$-extended histories, then for each $m \geq 1$ the number of $m$-rooted histories of $t$ is

$$h_{0,m} = e_{t,m} - e_{t,m-1}. \qquad (8)$$

Note that for $m = 1$, we explicitly use in eq. (8) the fact that $e_{t,0}$ is defined and equal to 0. In addition to setting
$e_{t,0} = 0$ for any tree $t$, as in [12] we also set $e_{t,k} = 1$ for all $k \geq 1$ in the case that $t$ has exactly 1 leaf.

Suppose $|t| \geq 1$ and $k \geq 0$. Denote by $t_L$ and $t_R$ the left and right subtrees of the root of $t$. We can compute
$e_{t,k}$ recursively as in Theorem 3.1 of [12]:

$$e_{t,k} = \begin{cases}
0 & \text{if } |t| \geq 1 \text{ and } k = 0 \\
1 & \text{if } |t| = 1 \text{ and } k \geq 1 \\
\sum_{i=1}^{k} e_{t_L,i+1} \, e_{t_R,i+1} & \text{if } |t| \geq 2 \text{ and } k \geq 1.
\end{cases} \qquad (9)$$
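
The recursion in eq. (9) can be transcribed almost line by line. The following is a minimal sketch under the assumption that trees are nested 2-tuples with string leaves; the function names `extended_histories` and `rooted_histories` are hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def extended_histories(tree, k):
    """Number e_{t,k} of k-extended histories, following the three cases of eq. (9)."""
    if k == 0:
        return 0                       # |t| >= 1 and k = 0
    if not isinstance(tree, tuple):
        return 1                       # |t| = 1 and k >= 1
    left, right = tree                 # |t| >= 2 and k >= 1
    return sum(
        extended_histories(left, i + 1) * extended_histories(right, i + 1)
        for i in range(1, k + 1)
    )

def rooted_histories(tree, m):
    """Number h_{0,m} of m-rooted histories, as the difference in eq. (8)."""
    return extended_histories(tree, m) - extended_histories(tree, m - 1)

t = (("A", "B"), ("C", "D"))
print([extended_histories(t, k) for k in range(5)])
print([rooted_histories(t, m) for m in range(1, 5)])
```

Memoization with `lru_cache` works here because nested tuples of strings are hashable; without it, the same subtree values would be recomputed many times.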


As was already observed in the remarks following Corollary 3.2 of [12], by eq. (9), for any tree $t$ with $|t| \geq 1$,
for positive integers $k \geq 1$, the function $f(k) = e_{t,k}$ is a polynomial in $k$. With our extension to permit $k = 0$,
we can extend this fact to $k \geq 0$ for $|t| \geq 2$: for any tree $t$ with $|t| \geq 2$, and for $k \geq 0$, we claim that the function
$f(k) = e_{t,k}$ is a polynomial in $k$. Note that in allowing $k = 0$, we claim $e_{t,k}$ is a polynomial in $k$ only for $|t| \geq 2$;
for $|t| = 1$, $e_{t,k}$ is not a polynomial in $k$ because $e_{t,0} = 0$ and $e_{t,k} = 1$ for $k \geq 1$.

To prove the claim, fix $t$ with $|t| \geq 2$ and consider the variable $k$ over domain $[1, \infty)$. We demonstrate that
$f(k)$ is a polynomial in $k$ for domain $[0, \infty)$ by showing that the closed-form polynomial for $f(k)$ has a factor of
$k$, so that our choice $e_{t,0} = 0$ in eq. (9) is compatible with the polynomial expression valid for $k \geq 1$.

Observe that for $i \geq 1$, $e_{t_L,i+1}$ and $e_{t_R,i+1}$ are polynomials in $i$, say $P_{t_L}(i+1)$ and $P_{t_R}(i+1)$. Replacing the terms
$e_{t_L,i+1}$ and $e_{t_R,i+1}$ that appear in the recursion in eq. (9) by the two polynomials $P_{t_L}(i+1)$ and $P_{t_R}(i+1)$, we obtain

$$e_{t,k} = \sum_{i=1}^{k} e_{t_L,i+1} \, e_{t_R,i+1} = \sum_{i=1}^{k} P_{t_L}(i+1) \, P_{t_R}(i+1) = \sum_{i=1}^{k} P'(i), \qquad (10)$$

where $P'(i)$ denotes a polynomial in $i$ that results from the product of $P_{t_L}(i+1)$ and $P_{t_R}(i+1)$. By Faulhaber’s
formula for sums of powers of integers, symbolic sums of the form $\sum_{i=1}^{k} i^p$ for fixed integer $p \geq 0$ are polynomials
containing a factor of $k$ in their closed forms (Section 6.5 of [8]); for example, $\sum_{i=1}^{k} i^3 = k^2 (k+1)^2 / 4$. Thus,
because the polynomial $P'(i)$ is a linear combination of terms of the form $i^p$, the closed-form expression for the
sum $\sum_{i=1}^{k} P'(i)$ appearing in eq. (10) also has a factor of $k$. It therefore has a value of 0 at $k = 0$.

Functions et^k for trees t with 1 < |t| < 9 and k > 1 appear in Tables 1-4 of m- For |t| > 2, as we have 
shown, these example polynomials are divisible by the variable representing the number of components of the 
root-branch. By eq. (8), we immediately obtain the following result.

Proposition 1 For any tree $t$ with $|t| \geq 2$ and for $m \geq 1$, the number $h_{0,m}$ of $m$-rooted histories of $t$ is a
polynomial in $m$ that can be computed by the difference in eq. (8), using $e_{t,k}$ as in eq. (9).

As an example of Proposition 1, consider the tree $t = ((A, B), (C, D))$, identifying this arbitrary labeling with
the unlabeled tree $(()())$. By applying the recursive procedure in eq. (9), we find that for $k \geq 0$, the number of
$k$-extended coalescent histories for $t$ is $e_{t,k} = \frac{1}{6} k (2k^2 + 9k + 13)$ [H]. The difference eq. (8) yields that for $m \geq 1$
the number of $m$-rooted histories of $t$ is $h_{0,m} = e_{t,m} - e_{t,m-1} = m^2 + 2m + 1$.
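
The closed form in this example can be checked by hand. The derivation below is a sketch whose intermediate steps are ours rather than part of the original text; it uses only the fact, which follows from the recursion for $e_{t,k}$, that a 2-leaf subtree has $e_{(A,B),k} = k$.

```latex
\begin{align*}
e_{t,k} &= \sum_{i=1}^{k} e_{(A,B),i+1} \, e_{(C,D),i+1} = \sum_{i=1}^{k} (i+1)^2 \\
        &= \frac{k(k+1)(2k+1)}{6} + k(k+1) + k = \frac{k \, (2k^2 + 9k + 13)}{6}, \\
h_{0,m} &= e_{t,m} - e_{t,m-1} = \sum_{i=1}^{m} (i+1)^2 - \sum_{i=1}^{m-1} (i+1)^2 = (m+1)^2 = m^2 + 2m + 1 .
\end{align*}
```

The factor of $k$ in the closed form is what makes the convention $e_{t,0} = 0$ consistent with the polynomial at $k = 0$.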

3.2.3 Generating rooted histories of $t^{(n+1)}$ from rooted histories of $t^{(n)}$

This section introduces an operator $\Omega$ that generates the rooted histories of $t^{(n+1)}$ from those of $t^{(n)}$. For each
rooted history $h'$ of $t^{(n+1)}$, there exists exactly one rooted history $h$ of $t^{(n)}$ with $h' \in \Omega(h)$. Recalling the
definitions of the sets $H_{n,m}(t)$ and $H_n(t)$ of $m$-rooted and rooted histories of $t^{(n)}$, we define $\Omega$ as follows.

Definition. Let $\mathcal{P}(X) = \{x : x \subseteq X\}$ denote the power set of set $X$, and fix tree $t$. The operator $\Omega$ is a function

$$\Omega : H_n(t) \to \mathcal{P}(H_{n+1}(t)) \quad \text{for each } n \geq 0,$$

where for a given rooted history $h \in H_n(t)$, $\Omega(h)$ is the set of rooted histories $h' \in H_{n+1}(t)$ for which the
restriction of $h'$ to $t^{(n+1)}$ excluding its most basal caterpillar branch coincides with the rooted history $h$ of $t^{(n)}$.

Denote by $b_1, b_2, \ldots, b_{n+1}$ the caterpillar branches in $t^{(n+1)}$, from the least basal $b_1$ to the most basal $b_{n+1}$
(Fig. 5). Upon removal of the most basal caterpillar branch $b_{n+1}$ from $t^{(n+1)}$, the root of $t^{(n+1)}$ (to which branch
$b_{n+1}$ is attached) is replaced by a demarcation between the first and second components of the root-branch of
$t^{(n)}$. For instance, in Fig. 5A, starting from tree $t = ((A, B), (C, D))$, we consider $h'''$, a 3-rooted history of $t^{(3)}$.
By removing the most basal caterpillar branch $b_3$ of $t^{(3)}$, we reduce to the 1-rooted history $h''$ of $t^{(2)}$ (Fig. 5B).
Next, by removing the caterpillar branch $b_2$ of $t^{(2)}$, we reduce to the 2-rooted history $h'$ of $t^{(1)}$ (Fig. 5C). By
removing the remaining caterpillar branch $b_1$ from $t^{(1)}$, we reduce to the 2-rooted history $h$ of $t = t^{(0)}$ (Fig. 5D).
Therefore, by the definition of $\Omega$, we have $h' \in \Omega(h)$, $h'' \in \Omega(h')$, and $h''' \in \Omega(h'')$.

By definition, $\Omega$ has the property that for each rooted history $h' \in H_{n+1}(t)$, with $n \geq 0$, there exists exactly
one rooted history $h \in H_n(t)$ such that $h' \in \Omega(h)$. In other words, for each $n \geq 0$, the set of rooted histories
$H_{n+1}(t)$ can be partitioned as a disjoint union,

$$H_{n+1}(t) = \bigcup_{h \in H_n(t)} \Omega(h). \qquad (11)$$

The set $H_{n+1}(t)$ is therefore generated without double occurrences of any rooted history by applying $\Omega$ to the
rooted histories in $H_n(t)$. It follows immediately that in performing $n$ iterations of $\Omega$ to obtain $\Omega[\ldots[\Omega[\Omega(H_0)]]\ldots]$
from the set $H_0$ of rooted histories of $t^{(0)}$, all the rooted histories of $t^{(n)}$ are generated exactly once.

3.2.4 Labels for rooted histories 

The operator $\Omega$, starting from the rooted histories of $t^{(n)}$, generates the rooted histories of $t^{(n+1)}$. In this section,
we introduce a labeling scheme, giving each $m$-rooted history $h$ of $t^{(n)}$ a label $L(h) = (n, m)$. We then describe
Figure 5: The relationships among rooted histories for sequential members of caterpillar-like families. For a rooted history
$h'''$ of $t^{(3)}$ with $t = ((A, B), (C, D))$, the figure sequentially removes caterpillar branches. By definition, a rooted history
$h'$ of $t^{(n+1)}$ belongs to the set $\Omega(h)$ if, by removing the most basal caterpillar branch $b_{n+1}$ in $t^{(n+1)}$, we recover the rooted
history $h$ of $t^{(n)}$. Note that when we remove the basal caterpillar branch $b_{n+1}$ from $t^{(n+1)}$, the root of $t^{(n+1)}$ (to which the
branch $b_{n+1}$ is attached) becomes the boundary between the first and second components of the root-branch of $t^{(n)}$ and
is depicted as a horizontal segment. (A) $h''' \in \Omega(h'')$. (B) $h'' \in \Omega(h')$. (C) $h' \in \Omega(h)$. (D) $h$. For each rooted history, the
value of the parameter $m$, representing the component of the root-branch that receives the image of the root, is shown.


how $\Omega$ acts on the labels of the rooted histories, characterizing the set of labels $L[\Omega(h)] = \{L(h') : h' \in \Omega(h)\}$. Our
goal is to represent each set $H_n$ of rooted histories of $t^{(n)}$ by the multiset of its labels, reducing the enumeration
of $|H_{n,m}|$ to the problem of counting certain ordered pairs $(n, m)$ iteratively generated by simple rules that reflect
how the rooted histories in $H_{n+1}$ are generated according to rule $\Omega$ from the rooted histories in $H_n$ by eq. (11).
In our labeling scheme, each rooted history $h \in H_n(t)$ that maps the root of $t^{(n)}$ onto the $m$th component
of the root-branch of $t^{(n)}$ receives label $L(h) = (n, m)$. The enumeration of $h_n = |H_{n,1}|$ then reduces to the
enumeration of those rooted histories labeled by $(n, 1)$.

Note that a label $(n, m)$ does not uniquely specify an $m$-rooted history of $t^{(n)}$: a tree $t^{(n)}$ has in general many
$m$-rooted histories, each receiving the label $(n, m)$. In other words, if $h_1, h_2 \in H_n(t)$ and $L(h_1) = L(h_2)$, then $h_1$ and
$h_2$ are not necessarily the same rooted history of $t^{(n)}$. We will, however, consider for $n \geq 0$ multisets of labels in
which we find a copy of the label $(n, m)$ for each $m$-rooted history of $t^{(n)}$.

To characterize how the operator $\Omega$ acts on the labels for rooted histories, consider an $m$-rooted history
$h \in H_n(t)$, so that $h$ maps the root of $t^{(n)}$ onto the $m$th component of the root-branch of $t^{(n)}$. This history is
labeled $L(h) = (n, m)$. For instance, taking the seed tree $t = ((A, B), (C, D))$, the history $h$ of $t = t^{(0)}$ depicted
in Figure 6A is labeled $L(h) = (0, 3)$, whereas the history $h$ of $t^{(1)}$ in Figure 6C is labeled $L(h) = (1, 1)$.

By applying $\Omega$ to a history $h$ of $t^{(n)}$ with $L(h) = (n, m)$, we produce a set of rooted histories $\Omega(h) \subseteq H_{n+1}(t)$.
The set of labels for $\Omega(h)$,

$$L[\Omega(h)] = \{L(h') : h' \in \Omega(h)\},$$

is determined according to the rule:

$$L[\Omega(h)] = \begin{cases}
\{(n+1, m') : m' \geq m\} & \text{if } m = 1 \\
\{(n+1, m') : m' \geq m - 1\} & \text{if } m \geq 2,
\end{cases} \qquad (12)$$
Figure 6: Generation of rooted histories of $t^{(n+1)}$ from rooted histories of $t^{(n)}$, as given by rule $\Omega$ applied to seed tree
$t = ((A, B), (C, D))$. To obtain rooted histories of $t^{(n+1)}$ (right) from rooted histories of $t^{(n)}$ (left), we choose the component
$m'$ of the root-branch of $t^{(n+1)}$ onto which the root of $t^{(n+1)}$ is mapped (solid arrows). The smallest among infinitely many
possible choices are depicted. For all nodes of $t^{(n+1)}$ except the root, the rooted history generated for $t^{(n+1)}$ coincides with
the generating rooted history of $t^{(n)}$ (dashed arrows). (A) A case with $m \geq 2$. A 3-rooted history $h$ of $t^{(0)}$, labeled $(0, 3)$,
is shown. (B) $\Omega(h)$ for $h$ in (A). 2-, 3-, and 4-rooted histories of $t^{(1)}$ belonging to $\Omega(h)$ are shown and are labeled $(1, 2)$,
$(1, 3)$, and $(1, 4)$, respectively. Because $m \geq 2$, $m' \geq m - 1$ as in eq. (12). (C) A case with $m = 1$. A 1-rooted history $h$ of
$t^{(1)}$, labeled $(1, 1)$, is shown. (D) $\Omega(h)$ for $h$ in (C). 1- and 2-rooted histories of $t^{(2)}$ belonging to $\Omega(h)$ are shown and are
labeled $(2, 1)$ and $(2, 2)$, respectively. Because $m = 1$, $m' \geq m$.

Figure 7: Iterative application of a rule for generating the multiset of the labels of the rooted histories of a tree $t^{(n)}$. The
iterative procedure starts with the multiset $L_0$ that contains those labels of the form $\{(0, m) : m \geq 1\}$ associated with the
rooted histories of a seed tree $t = t^{(0)}$. In the first step of the iteration, we apply $\Omega$ (eq. (13)) to each label of $L_0$. In the
second step, we apply $\Omega$ to each label resulting from the first step, and so on. The number of $m$-rooted histories of $t^{(n)}$
corresponds to the number of labels $(n, m)$, considered with their multiplicity, generated after the $n$th step of the iteration.


where $m'$ denotes the value of the parameter $m$ (the component of the root-branch of $t^{(n+1)}$ to which the root
is mapped) for the rooted histories $h' \in \Omega(h)$ of $t^{(n+1)}$.

The rule in eq. (12) distinguishes between two cases depending on whether the value of the parameter
$m = m(h)$ of the generating rooted history $h$ is equal to or exceeds 1. In both cases, the set $L[\Omega(h)]$ contains
infinitely many labels, each with its first component equal to $n + 1$, as the labels refer to rooted histories of
$t^{(n+1)}$. The value of the second component $m'$ ranges in $[m - 1, \infty)$ if $m \geq 2$, and in $[1, \infty)$ if $m = 1$.

Recall that according to the definition of $\Omega$, from an $m$-rooted history $h$ of $t^{(n)}$ (Fig. 6A and 6C), we generate
an $m'$-rooted history $h' \in \Omega(h)$ of $t^{(n+1)}$ (Fig. 6B and 6D) by (i) choosing the component $m'$ of the root-branch
of $t^{(n+1)}$ onto which $h'$ maps the root of $t^{(n+1)}$, and (ii) letting $h'$ coincide with $h$ on all nodes of $t^{(n+1)}$ except
the root. The rooted history $h'$ coincides with $h$ once we remove the most basal caterpillar branch of $t^{(n+1)}$.

Figure 6 illustrates both cases of eq. (12). In step (i), infinitely many choices of $m'$ are possible, because
the root-branch of $t^{(n+1)}$ is divided into infinitely many parts. The most basal caterpillar branch in $t^{(n+1)}$ is
attached at the border between the first and second components of the root-branch of $t^{(n)}$. Thus, the addition of
the $(n+1)$st caterpillar branch eliminates a component of the root-branch, so that if the starting rooted history
$h$ has $m \geq 2$ (Fig. 6A), then the root of $t^{(n)}$ maps to component $m - 1$ of the root-branch of $t^{(n+1)}$. The root of
$t^{(n+1)}$ can map to this same branch, or to any branch $m'$ with $m' \geq m - 1$. For instance, in Figure 6B, one of
the rooted histories $h'$ generated by a rooted history $h$ with $m = 3$ has $m' = m - 1 = 2$.

If $h$ has $m = 1$, however, then production of $h'$ is slightly different (Fig. 6C). By definition, the parameter
$m$ for a rooted history cannot be smaller than 1. The value $m' = m - 1$ is not permitted in this case, and $m'$
remains greater than or equal to $m = 1$ (Fig. 6D).
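
The two cases of the label rule are simple enough to state in a few lines of code. Because each label generates infinitely many new labels, the sketch below truncates the output at a cutoff `m_max`; the function name and the cutoff parameter are hypothetical conveniences, not part of the construction itself.

```python
def omega_labels(n, m, m_max):
    """Labels (n+1, m') generated from the label (n, m), truncated to m' <= m_max.

    If m == 1 the smallest new component is m itself; if m >= 2 it is m - 1.
    """
    if m < 1:
        raise ValueError("the component index m must be at least 1")
    lowest = m if m == 1 else m - 1
    return [(n + 1, m_new) for m_new in range(lowest, m_max + 1)]

# The two cases illustrated in the Figure 6 caption:
print(omega_labels(0, 3, 4))   # case m >= 2
print(omega_labels(1, 1, 2))   # case m = 1
```

The truncated outputs reproduce the labels listed in the Figure 6 caption for panels (B) and (D).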

3.2.5 From counting rooted histories to counting their labels 

The labeling scheme in Section 3.2.4 encodes the application of the operator $\Omega$ to the rooted histories of $t^{(n)}$.
Now that we have described the set of labels $L[\Omega(h)]$ arising from the label $L(h)$ according to the rule in eq. (12),
the problem of counting a set of rooted histories becomes a problem of counting the set of the associated labels
along with their multiplicities, or the multiset of the labels.

For $n \geq 0$ and $m \geq 1$, we use $\Omega((n, m))$ to denote, with an abuse of notation, the set of labels $L[\Omega(h)]$ when
$L(h) = (n, m)$. Recalling that iterative application of $\Omega$ to the rooted histories $H_0$ of tree $t^{(0)}$ generates the rooted
histories $H_n$ of $t^{(n)}$, the enumeration of $|H_{n,m}|$ for tree $t = t^{(0)}$ becomes a problem of counting those labels of the
form $(n, m)$ that are generated when we iteratively apply the operator $\Omega$ as $\Omega[\ldots[\Omega[\Omega(L_0)]]\ldots]$ starting from
the multiset of labels $L_0 = \{L(h) : h \in H_0(t)\}$ (Fig. 7).

Eq. (12) characterizes the set of labels $L[\Omega(h)]$ of the rooted histories in $\Omega(h)$ in terms of the label $L(h)$
of rooted history $h$. If $L(h) = (n, m)$, then $\Omega((n, m))$ denotes the set of labels $L[\Omega(h)]$. Thus, converting the
notation from histories to labels, eq. (12) becomes

$$\Omega((n, m)) = \begin{cases}
\{(n+1, m') : m' \geq m\} & \text{if } m = 1 \\
\{(n+1, m') : m' \geq m - 1\} & \text{if } m \geq 2.
\end{cases} \qquad (13)$$

For the seed tree $t$, we count $h_{n,m} = |H_{n,m}|$ by evaluating the number of occurrences of the ordered pair $(n, m)$
in the multiset defined as

$$L_n = L[H_n(t)] = \{L(h) : h \in H_n(t)\}. \qquad (14)$$

In symbols, we have

$$h_{n,m} = |\{\ell \in L_n : \ell = (n, m)\}|. \qquad (15)$$

By eq. (11), each multiset $L_n$ is generated iteratively (Fig. 7). We start with the multiset of labels

$$L_0 = \{L(h) : h \in H_0(t)\}. \qquad (16)$$

For each $n \geq 0$, the multiset $L_{n+1}$ is obtained as the union

$$L_{n+1} = \biguplus_{(n,m) \in L_n} \Omega((n, m)), \qquad (17)$$

where the symbol $\biguplus$ denotes the union operator for multisets. Thus, in $M = M_1 \biguplus M_2$, if an element $x$ appears $n_1$
times in $M_1$ and $n_2$ times in $M_2$, then it appears $n_1 + n_2$ times in $M$. Eq. (17) provides an iterative generation of
the labels for the rooted histories of $H_{n+1}(t)$ from the labels of the rooted histories of $H_n(t)$, retaining information
about the multiplicity of occurrences of each label.
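
The iterative generation of label multisets can be simulated directly. The sketch below is an illustration under stated assumptions, not the generating-function method developed in the next section: it stores each multiset as a `collections.Counter`, takes the seed-tree counts $h_{0,m}$ as a user-supplied function `h0`, and truncates the component index at `n_max + 1`. That cutoff is our choice; it suffices because a label can lower its component index by at most 1 per step, so labels with larger indices never contribute to $(n, 1)$ for $n \leq$ `n_max`.

```python
from collections import Counter

def h_sequence(h0, n_max):
    """Return [h_0, ..., h_{n_max}], the multiplicities of the labels (n, 1).

    h0(m) must return h_{0,m}, the number of m-rooted histories of the seed tree.
    """
    m_cap = n_max + 1  # truncation of the (infinite) range of component indices
    labels = Counter({(0, m): h0(m) for m in range(1, m_cap + 1)})
    result = [labels[(0, 1)]]
    for n in range(n_max):
        nxt = Counter()
        for (_, m), mult in labels.items():
            # Label rule: m' >= m if m == 1, and m' >= m - 1 if m >= 2.
            lowest = m if m == 1 else m - 1
            for m_new in range(lowest, m_cap + 1):
                nxt[(n + 1, m_new)] += mult  # multiset union keeps multiplicities
        labels = nxt
        result.append(labels[(n + 1, 1)])
    return result

# Seed tree ((A,B),(C,D)), for which h_{0,m} = m^2 + 2m + 1:
print(h_sequence(lambda m: (m + 1) ** 2, 4))
# 2-taxon seed tree, for which h_{0,m} = 1 for every m:
print(h_sequence(lambda m: 1, 4))
```

Only the counts at component index 1 are guaranteed exact at every step; counts for larger indices near the cutoff are underestimates and should not be read off.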


3.3 Counting rooted histories with generating functions

We have now obtained eq. (15), which gives an equivalence between the number of $m$-rooted histories of $t^{(n)}$ and the number of labels $(n,m)$ in the multiset $L_n$, and eqs. (16) and (17), which give through $\Omega$ (eq. (13)) an iterative procedure that generates the family of multisets $(L_n)_{n \geq 0}$. In this section, we translate the iterative procedure into algebraic terms, determining the generating function associated with the integer sequence $(h_n)_{n \geq 0}$.

First, in Section 3.3.1, we characterize a generating function $g(y)$ for the sequence $(h_{0,m})_{m \geq 1}$. Next, in Section 3.3.2, we deduce an equation satisfied by the bivariate generating function $F(y,z)$ for $(h_{n,m})_{n \geq 0, m \geq 1}$. In Section 3.3.3, we solve the equation, obtaining the desired generating function $f(z)$ for the sequence $(h_{n,1})_{n \geq 0}$. This generating function can be written in turn as a function of $g(y)$.

3.3.1 Generating function for the sequence $(h_{0,m})_{m \geq 1}$

In this section, we characterize the generating function $g(y)$ that counts for a given seed tree $t$ the labels in the multiset $L_0$ of labels of the rooted histories of $t$.

Fix the seed tree $t$. Recalling the equivalence in eq. (15), define the generating function

$$g(y) = \sum_{(0,m) \in L_0} y^m = \sum_{m=1}^{\infty} h_{0,m} y^m, \tag{18}$$

the $m$th coefficient of whose power series expansion provides the number of labels $(0,m)$ appearing in $L_0$. By Proposition 1, $h_{0,m}$ can be expressed as a polynomial in the variable $m$ and can thus be decomposed as a finite linear combination of terms of the form $m^k$, where $k$ is a non-negative integer. That is, for a certain finite set of non-negative integers with largest element $K$,

$$h_{0,m} = \sum_{k=0}^{K} w_k m^k, \tag{19}$$

where the $w_k$ are constants.

We introduce generating functions $g_{m^k}$, one for each $k$ from 0 to $K$, in which the $m$th coefficient is $m^k$:

$$g_{m^k}(y) = \sum_{m=1}^{\infty} m^k y^m. \tag{20}$$

Because $K$ is finite, the desired generating function $g(y)$ can be written as a finite linear combination of this new collection of generating functions $g_{m^0}(y), g_{m^1}(y), \ldots, g_{m^K}(y)$. More precisely, by substituting in eq. (18) the polynomial in eq. (19) and switching the order of summation, we obtain

$$g(y) = \sum_{k=0}^{K} w_k g_{m^k}(y). \tag{21}$$


We now state a lemma that characterizes the generating functions $g_{m^k}(y)$.

Lemma 1 For each non-negative integer $k$ from 0 to $K$, the generating function $g_{m^k}(y)$ in eq. (20) is rational with denominator $(1-y)^{k+1}$. That is, $g_{m^k}(y)$ has the form

$$g_{m^k}(y) = \frac{P(y)}{(1-y)^{k+1}},$$

where $P(y)$ is a polynomial in $y$.

Proof. We proceed by induction on $k$. If $k = 0$, then by eq. (20), $g_{m^0}(y) = 1/(1-y) - 1 = y/(1-y)$. Assume the inductive hypothesis for $g_{m^k}(y)$. Applying eq. (20) to $g_{m^{k+1}}(y)$, we can recover $g_{m^{k+1}}(y)$ as

$$g_{m^{k+1}}(y) = y \, \frac{d g_{m^k}(y)}{dy}, \tag{22}$$

which by the quotient rule for derivatives is a rational function with denominator $(1-y)^{k+2}$. □

The proof of the lemma gives a recursive procedure in eq. (22) to compute the functions $g_{m^k}(y)$. By eq. (21), we immediately obtain from the lemma a result about the generating function $g(y)$.
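As a hedged illustration of the recursion in eq. (22), the sketch below tracks only the numerator polynomial $P_k(y)$ in $g_{m^k}(y) = P_k(y)/(1-y)^{k+1}$. Applying the quotient rule to eq. (22) gives $P_{k+1}(y) = y[(1-y)P_k'(y) + (k+1)P_k(y)]$; this intermediate identity and the helper names `next_numerator` and `numerators` are ours, not the paper's. Polynomials are stored as coefficient lists, with entry $i$ holding the coefficient of $y^i$.

```python
def next_numerator(p, k):
    """Return P_{k+1} given P_k, where g_{m^k}(y) = P_k(y) / (1 - y)**(k + 1)."""
    inner = [0] * len(p)
    for i, c in enumerate(p):
        if i >= 1:
            # Derivative term i*c*y**(i-1), multiplied by (1 - y).
            inner[i - 1] += i * c
            inner[i] -= i * c
        inner[i] += (k + 1) * c  # the (k + 1) * P_k(y) term
    result = [0] + inner  # multiply by y
    while len(result) > 1 and result[-1] == 0:
        result.pop()  # drop trailing zero coefficients
    return result


def numerators(k_max):
    """Return [P_0, ..., P_{k_max}], starting from P_0(y) = y."""
    p = [0, 1]
    out = [p]
    for k in range(k_max):
        p = next_numerator(p, k)
        out.append(p)
    return out


if __name__ == "__main__":
    for k, p in enumerate(numerators(3)):
        print(k, p)
```

For $k = 2$ the sketch returns the coefficients of $y + y^2$, matching the numerator $y(y+1)$ of $g_{m^2}(y)$ used in the example of this section.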


Proposition 2 The generating function $g(y)$ whose $m$th coefficient $[y^m]g(y)$ is the number of $m$-rooted histories $h_{0,m}$ of a seed tree $t$ can be written as a finite linear combination

$$g(y) = \sum_{j=1}^{J} q_j \frac{y^{a_j}}{(1-y)^b}, \tag{23}$$

where $b \geq 1$ and $J \geq 1$ are positive integers, each $a_j$ is a non-negative integer, and the $q_j$ are constants.


As an example, we show how the procedure in Proposition 2 can be applied to determine the generating function $g(y)$ for $t = ((A,B),(C,D))$, the same example seed tree for which we computed the polynomial $h_{0,m}$ via Proposition 1. Recall from Section 3.2.2 that $h_{0,m} = m^2 + 2m + 1$. To obtain the generating function $g(y)$ that has coefficients $[y^m]g(y) = m^2 + 2m + 1$, we sum generating functions for the monomials $m^2$, $2m$, and 1. We already know $g_{m^0}(y)$, and by applying eq. (22), we have

$$g_{m^0}(y) = \frac{y}{1-y},$$
$$g_{m^1}(y) = y \, \frac{d g_{m^0}(y)}{dy} = \frac{y}{(1-y)^2},$$
$$g_{m^2}(y) = y \, \frac{d g_{m^1}(y)}{dy} = \frac{y(y+1)}{(1-y)^3}.$$

Thus,

$$g(y) = g_{m^0}(y) + 2 g_{m^1}(y) + g_{m^2}(y) = \frac{y^3 - 3y^2 + 4y}{(1-y)^3}. \tag{24}$$

In eq. (24), $g(y)$ is written as in eq. (23), taking $b = 3$, $J = 3$, $(a_1, a_2, a_3) = (1, 2, 3)$, and $(q_1, q_2, q_3) = (4, -3, 1)$.
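A quick way to check a decomposition such as eq. (24) is to expand the rational function as a power series and compare its coefficients with the polynomial $h_{0,m}$. The sketch below is an illustration under our own naming (`series_coefficients` is hypothetical); it uses the fact that dividing a power series by $1-y$ replaces its coefficients by their running sums.

```python
def series_coefficients(numerator, b, n_terms):
    """Coefficients of numerator(y) / (1 - y)**b up to y**(n_terms - 1).

    numerator[i] is the coefficient of y**i. Each division by (1 - y)
    is carried out as a running sum of the coefficient list.
    """
    coeffs = (list(numerator) + [0] * n_terms)[:n_terms]
    for _ in range(b):
        running = 0
        for i in range(n_terms):
            running += coeffs[i]
            coeffs[i] = running
    return coeffs


if __name__ == "__main__":
    # g(y) = (y^3 - 3y^2 + 4y) / (1 - y)^3 for the seed tree ((A,B),(C,D)).
    expansion = series_coefficients([0, 4, -3, 1], 3, 8)
    print(expansion)
    print([(m + 1) ** 2 for m in range(1, 8)])
```

The coefficient of $y^m$ for $m \geq 1$ agrees with $m^2 + 2m + 1 = (m+1)^2$, as eq. (24) requires.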

3.3.2 Bivariate generating function for the integers $(h_{n,m})_{n \geq 0, m \geq 1}$

Given $t$, the polynomial nature of $h_{0,m}$ in $m$ enabled us to obtain a generating function for $(h_{0,m})_{m \geq 1}$. We now use the iterative procedure in eq. (17) to determine an equation that characterizes the bivariate generating function with coefficients $h_{n,m}$. We represent each label of the form $(n,m)$ by a symbolic algebraic expression in the variables $y$ and $z$, so that $(n,m)$ is replaced by $z^n y^m$. Let $L = \biguplus_{n=0}^{\infty} L_n$ be the multiset of labels of all $m$-rooted histories for all trees $t^{(n)}$. Considering $y$ and $z$ as complex variables in two sufficiently small neighborhoods of 0, we aim to characterize the bivariate function $F(y,z)$ that admits the expansion

$$F(y,z) = \sum_{(n,m) \in L} z^n y^m,$$

where the sum is over all labels in the multiset $L$ and thus has a term for each $m$-rooted history of each $t^{(n)}$. In particular, the function $F(y,z)$ is the bivariate generating function of the integers $h_{n,m}$, and its Taylor expansion can be written as

$$F(y,z) = \sum_{m=1}^{\infty} \sum_{n=0}^{\infty} h_{n,m} z^n y^m, \tag{25}$$

where the coefficients $h_{n,m}$ appear explicitly.

By differentiating $F(y,z)$ with respect to $y$ and then taking $y = 0$, we obtain

$$\frac{\partial F}{\partial y}(0,z) = \sum_{n=0}^{\infty} h_{n,1} z^n. \tag{26}$$

Thus, for each $n \geq 0$, we have

$$h_n = h_{n,1} = [z^n]\left(\frac{\partial F}{\partial y}(0,z)\right).$$

By representing each label of the form $(n,m)$ by the symbolic expression $z^n y^m$ and assuming the complex variables $y$ and $z$ are sufficiently close to 0, the recursive generation in eq. (17) of the multisets of labels $L_0, L_1, L_2, \ldots$ determines an equation for $F(y,z)$, demonstrated in Appendix 1:

$$F(y,z)\left(1 - \frac{z}{y(1-y)}\right) = g(y) - z \frac{\partial F}{\partial y}(0,z). \tag{27}$$

Eq. (27) holds if the complex variables $y$ and $z$ are in two sufficiently small neighborhoods of 0, and it characterizes the generating function $F(y,z)$.
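To see concretely how eq. (27) encodes the succession rule, one can multiply both sides by $y(1-y)$ and compare coefficients of $z^n y^m$. The left side contributes $h_{n,m-1} - h_{n,m-2} - h_{n-1,m}$. On the right side, $y(1-y)g(y)$ contributes $h_{0,m-1} - h_{0,m-2}$ when $n = 0$, and the term with $\partial F/\partial y(0,z)$ contributes $-h_{n-1,1}$ for $m = 1$ and $+h_{n-1,1}$ for $m = 2$ when $n \geq 1$. The sketch below checks this identity on a truncated table of $h_{n,m}$ built from eq. (17). The function names and the coefficient-wise reformulation are ours, and the check is only a consistency test for small indices, not a proof.

```python
from itertools import accumulate


def history_table(h0, n_max, m_max):
    """rows[n][m - 1] = h_{n,m} for m = 1..m_max + n_max - n, via eq. (17)."""
    row = [h0(m) for m in range(1, m_max + n_max + 1)]
    rows = [row]
    for _ in range(n_max):
        row = list(accumulate(row))[1:]  # h_{n+1,m'} = sum of h_{n,m}, m <= m'+1
        rows.append(row)
    return rows


def h(rows, n, m):
    """h_{n,m}, taken to be 0 when n < 0 or m < 1."""
    return rows[n][m - 1] if n >= 0 and m >= 1 else 0


def check_eq27(h0, n_max, m_max):
    """Compare coefficients of z^n y^m in eq. (27) multiplied by y(1 - y)."""
    rows = history_table(h0, n_max, m_max)
    for n in range(n_max + 1):
        for m in range(1, m_max + 1):
            lhs = h(rows, n, m - 1) - h(rows, n, m - 2) - h(rows, n - 1, m)
            if n == 0:
                rhs = h(rows, 0, m - 1) - h(rows, 0, m - 2)
            else:
                rhs = -h(rows, n - 1, 1) * ((m == 1) - (m == 2))
            if lhs != rhs:
                return False
    return True


if __name__ == "__main__":
    print(check_eq27(lambda m: (m + 1) ** 2, 6, 6))
```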


3.3.3 Generating function for the sequence $(h_{n,1})_{n \geq 0}$

We now have an equation satisfied by the bivariate generating function $F(y,z)$. Further, we have eq. (26), which demonstrates that the desired generating function for the sequence $(h_n)_{n \geq 0}$ is obtained from $\frac{\partial F}{\partial y}(0,z)$. By applying the kernel method [Hill], we can determine the power series $\frac{\partial F}{\partial y}(0,z)$ from eq. (27).

The idea of the method consists of coupling the two variables $(z,y)$ as $(z,y(z))$ in such a way that two conditions hold. First, (i) substituting $y = y(z)$ cancels the kernel of the equation, that is, the factor $1 - z/[y(1-y)]$ on the left-hand side of eq. (27). Second, (ii) for $z$ near 0, the value of $y(z)$ remains in a sufficiently small neighborhood of $y = 0$, so that eq. (27) still holds near $z = 0$ after substituting $y = y(z)$. This condition is required, as the power series expansion in eq. (25) for $F(y,z)$ has been assumed to be valid in a neighborhood of $(y,z) = (0,0)$, and the derivation of eq. (27) relies on the fact that $y$ and $z$ are sufficiently close to 0. If the two conditions hold, then the left-hand side of eq. (27) vanishes, giving

$$g(y(z)) - z \frac{\partial F}{\partial y}(0,z) = 0,$$

so that $g(y(z))$ must be a power series at $z = 0$, because $z \frac{\partial F}{\partial y}(0,z)$ is one.

The required substitution couples $y$ and $z$ in such a way that $1 - z/[y(1-y)] = 0$, so that $y(z) = (1 \pm \sqrt{1-4z})/2$. To determine whether to take the negative root $y_1(z)$ or the positive root $y_2(z)$, we note that if $z$ is near 0, then $y_1(z)$ approaches 0, so that $y_1(z)$ lies in a neighborhood of $y = 0$ and $g(y_1(z))$ admits a power series expansion for $z$ near 0. For $y_2(z)$, however, if $z$ is near 0, then $y_2(z)$ approaches 1, and thus, $g(y_2(z))$ is not a power series for $z$ near 0 due to the pole of the function $g(y)$ at $y = 1$ (Proposition 2). The only solution satisfying both (i) and (ii) is consequently

$$Y(z) = y_1(z) = \frac{1 - \sqrt{1-4z}}{2}, \tag{28}$$

which, with the generating function $C(z)$ of the Catalan numbers as in eq. (3), satisfies $Y(z) = zC(z)$. Substituting $y = Y(z)$ in eq. (27), we have $\frac{\partial F}{\partial y}(0,z) = g(Y(z))/z$, yielding the following result.

Proposition 3 Fix tree $t$. Let $g(y)$ be the generating function associated with the polynomial $h_{0,m}$. Let $Y(z)$ be as in eq. (28). Then the generating function $f(z) = \sum_{n=0}^{\infty} h_n z^n$ is given by

$$f(z) = \frac{\partial F}{\partial y}(0,z) = \frac{g(Y(z))}{z} = \frac{g\left(\frac{1 - \sqrt{1-4z}}{2}\right)}{z}. \tag{29}$$


The proposition thus determines the generating function $f(z) = g(Y(z))/z$ for the integer sequence describing the number of matching coalescent histories of the species trees in the caterpillar-like family $(t^{(n)})_{n \geq 0}$. The function $g$ depends on the seed tree $t$, whereas the function $Y(z)$ is fixed in eq. (28) and does not depend on $t$.

As an example, recall that for $t = ((A,B),(C,D))$, in eq. (24), we have computed the generating function $g$ for the number $h_{0,m}$ of $m$-rooted histories of $t = t^{(0)}$. By Proposition 3, the generating function for the number $h_n$ of matching coalescent histories of $t^{(n)}$ is

$$f(z) = \sum_{n=0}^{\infty} h_n z^n = \frac{g\left(\frac{1 - \sqrt{1-4z}}{2}\right)}{z} = \frac{4(1 - \sqrt{1-4z})(3 - z + \sqrt{1-4z})}{z(1 + \sqrt{1-4z})^3}.$$

Taking the Taylor expansion of $f$, we obtain

$$f(z) = 4 + 13z + 42z^2 + 138z^3 + 462z^4 + 1573z^5 + 5434z^6 + 19006z^7 + 67184z^8 + \ldots \tag{30}$$
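The Taylor coefficients of $f(z) = g(Y(z))/z$ can be reproduced with exact integer arithmetic by composing truncated power series. The sketch below is only an illustration with hypothetical function names; it uses $Y(z) = zC(z)$, so that the coefficient of $z^n$ in $Y(z)$ is the Catalan number $c_{n-1} = \binom{2n-2}{n-1}/n$, together with $g(y) = \sum_{m \geq 1} h_{0,m} y^m$ from eq. (18). Since $Y(z)$ has no constant term, only the powers $Y^m$ with $m \leq n+1$ contribute to $h_n$.

```python
from math import comb


def catalan(n):
    """Catalan number c_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def mul_trunc(a, b, size):
    """Product of two coefficient lists, truncated to `size` terms."""
    out = [0] * size
    for i, x in enumerate(a[:size]):
        if x:
            for j, y in enumerate(b[: size - i]):
                out[i + j] += x * y
    return out


def f_coefficients(h0, n_max):
    """Return [h_0, ..., h_{n_max}] as coefficients of g(Y(z)) / z (eq. (29))."""
    size = n_max + 2  # keep degrees 0..n_max + 1 in z
    big_y = [0] + [catalan(n - 1) for n in range(1, size)]  # Y(z) = z C(z)
    total = [0] * size
    power = [1] + [0] * (size - 1)  # Y**0
    for m in range(1, size):
        power = mul_trunc(power, big_y, size)  # now Y**m
        total = [t + h0(m) * p for t, p in zip(total, power)]
    return total[1:]  # dividing by z shifts the coefficients


if __name__ == "__main__":
    # Seed tree ((A,B),(C,D)): h_{0,m} = (m + 1)^2.
    print(f_coefficients(lambda m: (m + 1) ** 2, 8))
```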


The coefficients $h_n$ accord with the enumeration of matching coalescent histories reported in Corollary 3.9 of [12] and Table 3 of [13] for caterpillar-like families with seed tree $t = ((A,B),(C,D))$, except that those results tabulated numbers of coalescent histories by the number of taxa, whereas here, we use the index of the caterpillar-like family. Thus, in this example, the coefficient of $z^n$ gives the number of matching coalescent histories for a tree with $n + 4$ taxa, as $|t| = 4$. Shifting the index in the previously reported formula to agree with our indexing scheme, we obtain $[(5(n+4) - 12)/(4(n+4) - 6)]c_{(n+4)-1} = [(5n+8)/(4n+10)]c_{n+3}$ for the number of matching coalescent histories of $t^{(n)}$. This formula gives precisely the coefficients in the Taylor expansion in eq. (30).

3.4 Asymptotic behavior of $h_n$

From Proposition 3, we have the generating function $f$ that counts the number of matching histories of $t^{(n)}$ for a given fixed seed tree $t$. Applying techniques of analytic combinatorics as introduced in Section 2.3, we can determine the asymptotic behavior of the coefficients of the generating function

$$\tilde{f}(z) = \sum_{n=1}^{\infty} h_{n-1} z^n = z f(z) = g(Y(z)), \tag{31}$$

with $Y(z)$ as in eq. (28). To simplify notation, we work with $\tilde{f}$ instead of $f$.

First, in Section 3.4.1, we obtain an asymptotic equivalence between $h_n$ and $\beta_t c_n$, where $\beta_t$ is a constant depending on the seed tree $t$, and the $c_n$ are the Catalan numbers (eq. (1)). Next, in Section 3.4.2, we produce a general procedure to determine the constants $\beta_t$, employing this procedure to obtain the values of $\beta_t$ for all seed trees $t$ with $|t| \leq 9$. We demonstrate that our values of $\beta_t$ accord with constant multiples of the Catalan numbers previously obtained according to a different method [13] for seed trees with $|t| \leq 8$.

3.4.1 A general asymptotic result

Recall that given $t$, Proposition 2 gives a procedure to determine the rational function $g$ in eq. (31). Writing $g$ as the finite linear combination in eq. (23), the values of $b$, $J$, and the $(a_j)_{1 \leq j \leq J}$ and $(q_j)_{1 \leq j \leq J}$ can all be computed.

As noted in Section 2.3, the expansion of $\tilde{f}$ at its dominant singularity characterizes the asymptotic behavior of the coefficients $h_{n-1}$. In Appendix 2, we obtain the expansion of $\tilde{f}$ at the dominant singularity $z = \frac{1}{4}$,

$$\tilde{f}(z) = \alpha_t + \beta_t\left(-\frac{\sqrt{1-4z}}{2}\right) \pm O(1-4z) \tag{32}$$

$$\tilde{f}(z) \sim \alpha_t + \beta_t\left(-\frac{\sqrt{1-4z}}{2}\right), \tag{33}$$

with

$$\alpha_t = \sum_{j=1}^{J} 2^{b-a_j} q_j, \tag{34}$$

$$\beta_t = \sum_{j=1}^{J} 2^{1+b-a_j}(a_j + b) q_j. \tag{35}$$

Note that in eq. (32), the seed tree affects only the constants $\alpha_t$ and $\beta_t$ computed in eqs. (34) and (35) from $g$, as written in the linear combination in eq. (23). Excluding the constant $\alpha_t$ that does not influence the asymptotic behavior of the coefficients, the main term of the expansion of $\tilde{f}(z)$ (eq. (33)) is the product of the constant $\beta_t$ and the generating function $-\sqrt{1-4z}/2$, whose $n$th coefficient is the Catalan number $c_{n-1}$ (eq. (1)).

Theorem VI.4 of [7] indicates that under conditions satisfied by $\tilde{f}$, the asymptotic coefficients of a generating function as $n \to \infty$ are obtained from the expansion of the function at the dominant singularity; moreover, the error term in the asymptotic coefficients can be computed from the error term in the singular expansion. Applying the theorem to the expansion in eq. (32), we obtain the asymptotic behavior of the coefficients $[z^n]\tilde{f}(z) = h_{n-1}$.

Proposition 4 For any seed tree $t$, when $n \to \infty$, the number $h_n$ of matching coalescent histories for $t^{(n)}$ satisfies

$$h_{n-1} = [z^n]\tilde{f}(z) = \beta_t [z^n]\left(-\frac{\sqrt{1-4z}}{2}\right) \pm O\left(\frac{4^n}{n^2}\right) = \beta_t c_{n-1} \pm O\left(\frac{4^n}{n^2}\right), \tag{36}$$

where $\beta_t$ is a constant that depends on $t$. The constant $\beta_t$ is computed in eq. (35) once the function $g$, which is defined in eq. (18), has been written as the linear combination in eq. (23).

We immediately obtain the following corollary, corresponding to our initial claim in eq. (6).

Corollary 1 For any seed tree $t$, there exists a constant $\beta_t > 0$ (eq. (35)) such that when $n \to \infty$,

$$h_n \sim \beta_t c_n. \tag{37}$$


Proof. The result follows from Proposition 4 by noting that if $\beta_t > 0$, then

$$\lim_{n \to \infty} \frac{h_{n-1}}{\beta_t c_{n-1}} = \lim_{n \to \infty} \frac{\beta_t c_{n-1} \pm O(4^n/n^2)}{\beta_t c_{n-1}} = 1.$$

Note that we are claiming $\beta_t > 0$. From the definition of $\beta_t$ as the sum in eq. (35), because the $q_j$ are permitted to be negative, it is not immediately clear that $\beta_t > 0$. Proposition 4 eliminates the possibility that $\beta_t$ is negative, as $h_{n-1}$ is necessarily positive. To show that $\beta_t \neq 0$, we note that by eq. (36), $\beta_t = 0$ would give

$$h_{n-1} = \pm O\left(\frac{4^n}{n^2}\right), \tag{38}$$

so that $h_{n-1}/(4^n/n^2)$ would remain bounded by a constant as $n \to \infty$.

We now apply the lower bound $h_n \geq c_{n+1}$ obtained earlier. By this bound, we have

$$\frac{h_{n-1}}{4^n/n^2} \geq \frac{c_n}{4^n/n^2} = \frac{\sqrt{n}}{\sqrt{\pi}} \cdot \frac{c_n}{4^n/(n^{3/2}\sqrt{\pi})}.$$

As $n \to \infty$, $\sqrt{n}/\sqrt{\pi}$ diverges to $\infty$, while $c_n/[4^n/(n^{3/2}\sqrt{\pi})]$ converges to 1 by eq. (5). Therefore, the sequence $h_{n-1}/(4^n/n^2)$ must diverge and eq. (38) cannot hold. Thus, $\beta_t \neq 0$. □

As an example of Corollary 1, consider $t = ((A,B),(C,D))$. By decomposing the function $g$ expressed in eq. (24) as in eq. (23), we have already obtained the parameters $b$, $J$, $(a_j)_{1 \leq j \leq J}$, and $(q_j)_{1 \leq j \leq J}$ in Section 3.3.1. Therefore, computing $\beta_t$ as in eq. (35), we obtain

$$\beta_t = 2^{1+3-1}(1+3)(4) + 2^{1+3-2}(2+3)(-3) + 2^{1+3-3}(3+3)(1) = 80.$$

Eq. (37) then produces $h_n \sim 80c_n$. Note that the limit $h_n \sim \frac{5}{4}c_{n+3}$ produced for this tree from $h_n = [(5n+8)/(4n+10)]c_{n+3}$ in Section 3.3.3 agrees with the limiting result $h_n \sim 80c_n$. Recalling eq. (1),

$$\frac{h_n}{c_n} = \frac{5n+8}{4n+10} \cdot \frac{c_{n+3}}{c_n} \sim \frac{5}{4} \cdot \frac{\binom{2n+6}{n+3}/(n+4)}{\binom{2n}{n}/(n+1)} \sim \frac{5}{4} \cdot 4^3 = 80.$$
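The limit above can also be observed numerically. The following sketch, with hypothetical function names, evaluates the exact ratio $h_n/c_n$ from the closed formula $h_n = [(5n+8)/(4n+10)]c_{n+3}$ quoted in Section 3.3.3, using rational arithmetic so that no rounding enters before the final conversion to a float.

```python
from fractions import Fraction
from math import comb


def catalan(n):
    """Catalan number c_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def ratio(n):
    """Exact h_n / c_n for the seed tree ((A,B),(C,D))."""
    h_over_c3 = Fraction(5 * n + 8, 4 * n + 10)  # h_n / c_{n+3}
    return h_over_c3 * Fraction(catalan(n + 3), catalan(n))


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, float(ratio(n)))
```

In this example the ratio increases toward 80 from below, with a gap that shrinks roughly in proportion to $1/n$.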


3.4.2 Determining $\beta_t$ from the seed tree $t$

We have shown in Corollary 1 that the number of matching coalescent histories $h_n$ for the caterpillar-like family $(t^{(n)})_{n \geq 0}$ is, for a constant $\beta_t$, asymptotic to $\beta_t c_n$. We can now assemble our results to describe a procedure that given a seed tree $t$ with $|t| \geq 2$ determines both the generating function with coefficients $h_n$ and the constant $\beta_t$.

(i) Determine the polynomial $e_{t,k}$ in $k \geq 0$ that counts the number of $k$-extended histories of $t$.

(ii) Compute from eq. (8) the polynomial $h_{0,m}$ in $m$ that counts for $m \geq 1$ the number of $m$-rooted histories of $t$.

(iii) Obtain the generating function $g(y) = \sum_{m=1}^{\infty} h_{0,m} y^m$ with coefficients $h_{0,m}$ by using Proposition 2.

(iv) Determine the generating function $f(z) = \sum_{n=0}^{\infty} h_n z^n$ with coefficients $h_n$ by applying Proposition 3.

(v) Write $g(y)$ as a linear combination according to eq. (23), determining the values of $b$, $J$, and the $a_j$ and $q_j$.

(vi) Compute the asymptotic constant $\beta_t$ from eq. (35).
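Steps (iii), (v), and (vi) can be sketched in Python as follows. This is not the CatFamily.nb program, and it assumes that steps (i) and (ii) have already produced the coefficients $w_0, \ldots, w_K$ of the polynomial $h_{0,m}$ in eq. (19). The function names are hypothetical. The sketch writes $g(y)$ over the common denominator $(1-y)^{K+1}$, so that $b = K+1$, each exponent $a_j$ is a power of $y$ in the numerator, and $q_j$ is the corresponding coefficient.

```python
from math import comb


def next_numerator(p, k):
    """P_{k+1} from P_k, where g_{m^k}(y) = P_k(y) / (1 - y)**(k + 1) (eq. (22))."""
    inner = [0] * len(p)
    for i, c in enumerate(p):
        if i >= 1:
            inner[i - 1] += i * c  # (1 - y) * P_k'(y)
            inner[i] -= i * c
        inner[i] += (k + 1) * c  # (k + 1) * P_k(y)
    result = [0] + inner  # multiply by y
    while len(result) > 1 and result[-1] == 0:
        result.pop()
    return result


def g_numerator(w):
    """Return (numerator, b) with g(y) = numerator(y) / (1 - y)**b, b = K + 1."""
    big_k = len(w) - 1
    total = [0] * (big_k + 2)
    p = [0, 1]  # P_0(y) = y
    for k in range(big_k + 1):
        # (1 - y)**(K - k) brings term k to the common denominator.
        scale = [(-1) ** i * comb(big_k - k, i) for i in range(big_k - k + 1)]
        for i, c in enumerate(p):
            for j, s in enumerate(scale):
                total[i + j] += w[k] * c * s
        if k < big_k:
            p = next_numerator(p, k)
    return total, big_k + 1


def beta_t(numerator, b):
    """Eq. (35), taking q_j = numerator[a_j] for each nonzero coefficient."""
    return sum(2 ** (1 + b - a) * (a + b) * q for a, q in enumerate(numerator) if q)


if __name__ == "__main__":
    numerator, b = g_numerator([1, 2, 1])  # h_{0,m} = 1 + 2m + m^2
    print(numerator, b, beta_t(numerator, b))
```

For the seed tree $((A,B),(C,D))$, the sketch recovers the numerator $y^3 - 3y^2 + 4y$ of eq. (24) and the constant $\beta_t = 80$.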

We have programmed this procedure in Mathematica; starting from a given seed tree $t$, our program CatFamily.nb can automatically compute for the caterpillar-like family $(t^{(n)})_{n \geq 0}$ the generating function with coefficients $h_n$ and the asymptotic constant $\beta_t$. Using this program, we have determined the value of $\beta_t$ for each seed tree with 9 taxa, collecting the results in Table 1.

Recall that Rosenberg [13] reported the asymptotic constant multiples of the Catalan numbers, $\beta_t^*$, which represent the asymptotic numbers of coalescent histories for seed trees with up to 8 taxa, indexing the results by the number of taxa $m$ rather than by the index $n$ of the caterpillar-like family. Also recall that for seed tree $t$, tree $t^{(n)}$ has $m = |t| + n$ taxa (Fig. 1). In the notation of [13], writing $A_{t,m}$ as the number of matching coalescent histories in the caterpillar-like tree with seed tree $t$ and $m \geq |t|$ taxa, we have $h_n = A_{t,|t|+n}$.

By eq. (5), we have the asymptotic equivalence $c_n \sim c_{n+k}/4^k$ for each positive integer $k$. Therefore,

$$h_n \sim \beta_t c_n \sim \frac{\beta_t}{4^{|t|-1}} c_{n+|t|-1} = \beta_t^* c_{m-1}, \tag{39}$$

where the asymptotic constant $\beta_t$ of Corollary 1 is normalized to obtain

$$\beta_t^* = \frac{\beta_t}{4^{|t|-1}}. \tag{40}$$

This computation converts the asymptotic constant multiple $\beta_t$ of $c_n$ into a corresponding multiple of $c_{m-1}$, as reported in [13] for small trees. Comparing Table 1 with Table 3 of [13], we see that for the cases examined by [13], the values of $\beta_t^*$ we compute from the associated $\beta_t$ agree with the values that were previously reported.
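The normalization in eq. (40) is a single exact division. The minimal sketch below (the function name `beta_star` is ours) applies it with rational arithmetic to the 4-taxon example of Section 3.4.1 and to two of the 9-taxon values of $\beta_t$ listed in Table 1.

```python
from fractions import Fraction


def beta_star(beta_t, num_taxa):
    """Eq. (40): beta_t^* = beta_t / 4**(|t| - 1), as an exact fraction."""
    return Fraction(beta_t, 4 ** (num_taxa - 1))


if __name__ == "__main__":
    print(beta_star(80, 4))        # seed tree ((A,B),(C,D)), |t| = 4
    print(beta_star(128864, 9))    # a 9-taxon entry of Table 1
    print(beta_star(532224, 9))    # the largest 9-taxon entry of Table 1
```

For $t = ((A,B),(C,D))$, the result $5/4$ matches the limit $h_n \sim \frac{5}{4}c_{n+3}$ noted in Section 3.4.1.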

4 Conclusions 

In this paper, we have solved a problem left open by [13] on determining the number of coalescent histories for gene trees and species trees that have a matching labeled topology and that belong to a generic caterpillar-like family. We have proven that for any seed tree $t$, the integer sequence $(h_n)_{n \geq 0}$, whose $n$th element represents the number of matching coalescent histories of the caterpillar-like tree $t^{(n)}$, grows asymptotically as a constant multiple of the Catalan numbers, that is, $h_n \sim \beta_t c_n$, where the constant term $\beta_t > 0$ depends on the shape of the seed tree $t$. Rosenberg [13] had previously obtained this result for seed trees with at most 8 taxa; here, by using a succession rule for recursive enumeration and then applying techniques of analytic combinatorics, we have not only proven the existence of the constant $\beta_t$ for seed trees of any size, but also produced a procedure that computes the constant $\beta_t$, as well as the expression for the generating function of the integers $(h_n)_{n \geq 0}$.

The numerical results on the constants $\beta_t$ extend the empirical observation of [13] that the caterpillar-like families that produce the largest numbers of matching coalescent histories are those whose seed tree has a high level of balance. By extending from seed trees with $|t| \leq 8$ taxa to those with $|t| = 9$, we observe that the constants $\beta_t^*$ for the caterpillar-like families with the largest and smallest numbers of matching coalescent histories become further separated, so that for $n$ large, many more coalescent histories exist by which a gene tree can match the species tree for some species trees than for others. For the 9-taxon seed tree with the largest $\beta_t^*$, $\beta_t^* \approx 8.12$ compared to $\beta_t^* = 1$ for the seed tree with the smallest $\beta_t^*$. Our procedure for evaluating $\beta_t$ and $\beta_t^*$ as a function of the seed tree can now enable further systematic analyses of the correlates of the constants $\beta_t$ and $\beta_t^*$, to facilitate additional explorations of determinants of the numbers of matching coalescent histories.

Nevertheless, although the constants $\beta_t$ and $\beta_t^*$ do depend on the seed tree, we have shown that all caterpillar-like families are asymptotically equivalent in their numbers of matching coalescent histories up to a constant factor. Thus, in considering large trees, the many caterpillar branches contribute to the asymptotic growth behavior of the number of matching coalescent histories, which follows a multiple of the Catalan numbers, and the seed tree contributes only to the constant by which the Catalan numbers are multiplied. From the viewpoint of computational complexity in evaluating gene tree probabilities according to formulas that sum over matching coalescent histories [4], all caterpillar-like families have the same growth pattern up to a constant.

Table 1: Asymptotic constants $\beta_t$ with $h_n \sim \beta_t c_n$, for seed trees $t$ with 9 taxa.

    Left column                      Right column
    $\beta_t$    $\beta_t^*$         $\beta_t$    $\beta_t^*$
    65,536       1                   128,864      4,027/2,048
    81,920       5/4                 166,624      5,207/2,048
    94,208       23/16               197,296      12,331/4,096
    104,448      51/32               224,704      3,511/1,024
    138,240      135/64              308,576      9,643/2,048
    118,784      29/16               262,000      16,375/4,096
    113,408      443/256             250,272      7,821/2,048
    148,480      145/64              339,504      21,219/4,096
    177,664      347/128             417,632      13,051/2,048
    141,312      69/32               326,240      10,195/2,048
    193,536      189/64              464,128      1,813/256
    121,472      949/512             182,912      1,429/512
    157,888      2,467/1,024         243,904      3,811/1,024
    187,776      1,467/512           296,064      2,313/512
    214,720      3,355/1,024         344,512      5,383/1,024
    296,192      1,157/256           487,808      3,811/512
    251,136      981/256             410,112      801/128
    162,560      635/256             214,016      209/64
    219,136      107/32              306,112      4,783/1,024
    268,288      131/32              294,784      2,303/512
    177,664      347/128             425,216      1,661/256
    249,344      487/128             366,720      2,865/512
    353,536      1,381/256           532,224      2,079/256

Values of $\beta_t$ appear for each of the 46 unlabeled species trees with 9 taxa. For each species tree $t$, we also provide the constant $\beta_t^* = \beta_t/4^8$ (eq. (40)). Trees are listed in increasing order by rank as defined in Section 2 of [13]. In the left column, each seed tree $t$ belongs to a caterpillar-like family whose seed tree has fewer than 9 taxa. In these cases, we recover the values of $\beta_t^*$ as determined in Table 3 of [13].

The extent to which other tree families follow the Catalan sequence in their numbers of matching coalescent 
histories remains unknown, though we have recently found a family, the lodgepole family, for which the number 
of matching coalescent histories grows faster than a constant multiple of the Catalan numbers [5]. The
use of our substantially different approach employing analytic combinatorics opens new methods for theoretical 
analysis of coalescent histories and can potentially assist in understanding when Catalan-like growth, the rapid 
growth of the lodgepole family, and intermediate or perhaps still faster growth patterns will apply. 


Appendix 1. The equation satisfied by $F(y,z)$

In this appendix, we complete the derivation of eq. (27) satisfied by $F(y,z)$. In the generating function $F(y,z)$ (eq. (25)), each monomial $z^n y^m$ corresponds to a label $(n,m) \in L_n$ that in turn represents an $m$-rooted history of $t^{(n)}$. Recall that the multisets of labels $L_0, L_1, L_2, \ldots$ (eq. (14)) can be iteratively generated according to eq. (17) through the operator $\Omega$ defined in eq. (13), starting from the multiset $L_0$. Also recall that by considering the multiset of labels $L = \biguplus_{n=0}^{\infty} L_n$, we can write $F(y,z) = \sum_{(n,m) \in L} z^n y^m$. We use the iterative generation of the family of multisets $(L_n)_{n \geq 0}$ to obtain an equation for $F$.

By eq. (13), for $n \geq 0$ and $m \geq 2$, for each occurrence in $L_n$ of a label $(n,m)$, a copy of each label in set

$$\Omega((n,m)) = \{(n+1, m+j) : j \geq -1\}$$

belongs to the multiset $L_{n+1}$. Thus, in algebraic terms, each time that an expression $z^n y^m$ with $n \geq 0$ and $m \geq 2$ is counted in the generating function $F$ (written $z^n y^m \in F$ in what follows), the terms $(z^{n+1} y^{m+j})_{j \geq -1}$ appear in $F$ as well. Summing over all possible $z^n y^m \in F$ with $n \geq 0$ and $m \geq 2$, we obtain

$$\sum_{z^n y^m \in F :\, n \geq 0, m \geq 2} \left( z^{n+1} \sum_{j=m-1}^{\infty} y^j \right) = \frac{z}{y} \sum_{z^n y^m \in F :\, n \geq 0, m \geq 2} \left( z^n y^m \sum_{j=0}^{\infty} y^j \right). \tag{41}$$

Similarly, for $n \geq 0$ and $m = 1$, for each occurrence in $L_n$ of a label $(n,1)$, a copy of each label in set $\Omega((n,1)) = \{(n+1,j) : j \geq 1\}$ appears in the multiset $L_{n+1}$. Thus, for each term $z^n y \in F$, with $n \geq 0$, the terms $(z^{n+1} y^j)_{j \geq 1}$ are counted in $F$ as well. Summing these terms for all $z^n y \in F$ with $n \geq 0$, we obtain

$$\sum_{z^n y \in F :\, n \geq 0} \left( z^{n+1} \sum_{j=1}^{\infty} y^j \right) = zy \sum_{z^n y \in F :\, n \geq 0} \left( z^n \sum_{j=0}^{\infty} y^j \right). \tag{42}$$

Notice that the sum of the expressions in eqs. (41) and (42) is the algebraic representation of the multiset of labels $L \setminus L_0$. More precisely, each term $z^n y^m \in F$ associated with a label $(n,m) \in L_n$, with $n \geq 1$, is counted, and counted exactly once, in the sum of eqs. (41) and (42). Therefore, to complete the description of $F$, we require only those terms $z^0 y^m$ associated with labels $(0,m) \in L_0$. These terms are represented by

$$\sum_{(0,m) \in L_0} z^0 y^m = \sum_{m=1}^{\infty} h_{0,m} y^m = g(y), \tag{43}$$

considering that $h_{0,m} = |\{\ell \in L_0 : \ell = (0,m)\}|$ (eq. (15)) and that by definition, $g(y) = \sum_{m=1}^{\infty} h_{0,m} y^m$ (eq. (18)).

We can now equate the full generating function $F(y,z)$ to the sum of eqs. (43), (41), and (42), obtaining

$$F(y,z) = g(y) + \frac{z}{y} \sum_{z^n y^m \in F :\, n \geq 0, m \geq 2} \left( z^n y^m \sum_{j=0}^{\infty} y^j \right) + zy \sum_{z^n y \in F :\, n \geq 0} \left( z^n \sum_{j=0}^{\infty} y^j \right). \tag{44}$$

Applying the fact that $\sum_{j=0}^{\infty} y^j = 1/(1-y)$ for $y$ near 0 in the complex plane, we then have

$$F(y,z) = g(y) + \frac{z}{y(1-y)} \left( \sum_{z^n y^m \in F :\, n \geq 0, m \geq 2} z^n y^m \right) + \frac{zy}{1-y} \left( \sum_{z^n y \in F :\, n \geq 0} z^n \right). \tag{45}$$

By eq. (25) and the fact that the multisets of labels $(n,m)$ for $m$-rooted histories of $t^{(n)}$ have $h_{n,m}$ elements,

$$\sum_{z^n y \in F :\, n \geq 0} z^n = \frac{\partial F}{\partial y}(0,z),$$

$$\sum_{z^n y^m \in F :\, n \geq 0, m \geq 2} z^n y^m = F(y,z) - \sum_{z^n y \in F :\, n \geq 0} z^n y = F(y,z) - y \frac{\partial F}{\partial y}(0,z).$$

Substituting in eq. (45), the last two expressions yield

$$F(y,z) = g(y) + \frac{z}{y(1-y)} \left[ F(y,z) - y \frac{\partial F}{\partial y}(0,z) \right] + \frac{zy}{1-y} \frac{\partial F}{\partial y}(0,z), \tag{46}$$

which can be rewritten as in eq. (27).


Appendix 2. The dominant singularity and singular expansion of $\tilde{f}(z)$

This appendix obtains the singular expansion of $\tilde{f}(z)$ described in eq. (32). In eq. (31), we have defined $\tilde{f}(z)$ as a composition $\tilde{f}(z) = g(Y(z))$, with the internal function $Y(z)$ as in eq. (28) and the external function $g(y)$ as in eq. (23). Owing to the presence of the square root in the expression for $Y(z)$, the dominant singularity of the internal function $Y(z)$ (the singularity nearest the origin of the complex plane) is at $z = \frac{1}{4}$. Computing the value of $Y(z)$ at its dominant singularity, we obtain $Y(\frac{1}{4}) = \frac{1}{2}$. In particular, we have $Y(\frac{1}{4}) < 1$, where 1 is the radius of convergence of the series corresponding to the external function $g$ in $\tilde{f}$. Indeed, it immediately follows from Proposition 2 that $y = 1$ is the dominant singularity of $g(y)$.

As detailed in Section VI.9 of [7], on dominant singularities of compositions, we are in the setting of the subcritical case, in which the inequality $Y(\frac{1}{4}) < 1$ implies that the dominant singularity of $g(Y(z))$ coincides with the dominant singularity $z = \frac{1}{4}$ of the internal function $Y(z)$ rather than the dominant singularity $y = 1$ of the external function $g(y)$. The desired singular expansion of $\tilde{f}(z) = g(Y(z))$ at the dominant singularity $z = \frac{1}{4}$ can be obtained by inserting $y = Y(z)$ in the regular (non-singular) expansion of $g(y)$ at $y = Y(\frac{1}{4}) = \frac{1}{2}$.

To recover the expansion of $g(y)$ at $y = \frac{1}{2}$, we expand and then sum each term $q_j[y^{a_j}/(1-y)^b]$ of the finite linear combination in eq. (23). At $y = \frac{1}{2}$, each of these terms is an analytic function, and we can thus use Taylor's formula to produce the desired expansion. We obtain at $y = \frac{1}{2}$

$$\frac{y^{a_j}}{(1-y)^b} = 2^{b-a_j} + 2^{1+b-a_j}(a_j + b)\left(y - \frac{1}{2}\right) \pm O\left(\left(y - \frac{1}{2}\right)^2\right).$$

By summing over the indices $1 \leq j \leq J$ of eq. (23), the expansion of $g(y)$ at $y = \frac{1}{2}$ is

$$g(y) = \alpha_t + \beta_t\left(y - \frac{1}{2}\right) \pm O\left(\left(y - \frac{1}{2}\right)^2\right), \tag{47}$$

with the constants $\alpha_t$ and $\beta_t$ defined as in eqs. (34) and (35). Plugging $y = Y(z)$ from eq. (28) into eq. (47), we finally obtain the singular expansion of $\tilde{f}(z)$ at $z = \frac{1}{4}$ as in eq. (32).
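The constants in eq. (47) can be checked independently of the Taylor computation: $\alpha_t$ should equal $g(\frac{1}{2})$, and $\beta_t$ should equal the slope of $g$ at $y = \frac{1}{2}$. The sketch below, with hypothetical function names, evaluates $g$ exactly with rational arithmetic and approximates the slope by a central difference. The parameters are those of eq. (24) for $t = ((A,B),(C,D))$, and the step size is an arbitrary choice of ours.

```python
from fractions import Fraction


def g(y, q, a, b):
    """g(y) = sum_j q_j * y**a_j / (1 - y)**b, as in eq. (23)."""
    return sum(Fraction(qj) * y ** aj / (1 - y) ** b for qj, aj in zip(q, a))


def alpha_beta(q, a, b):
    """The constants alpha_t and beta_t of eqs. (34) and (35)."""
    alpha = sum(Fraction(2) ** (b - aj) * qj for qj, aj in zip(q, a))
    beta = sum(Fraction(2) ** (1 + b - aj) * (aj + b) * qj for qj, aj in zip(q, a))
    return alpha, beta


def central_slope(q, a, b, y, eps):
    """Central-difference approximation to g'(y) with step eps."""
    return (g(y + eps, q, a, b) - g(y - eps, q, a, b)) / (2 * eps)


if __name__ == "__main__":
    q, a, b = (4, -3, 1), (1, 2, 3), 3
    half = Fraction(1, 2)
    print(alpha_beta(q, a, b))
    print(g(half, q, a, b))
    print(float(central_slope(q, a, b, half, Fraction(1, 10 ** 6))))
```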